Skip to content

Fix attached CSV being shadowed by same-named entity list#6908

Open
grzesiek2010 wants to merge 4 commits into
getodk:masterfrom
grzesiek2010:COLLECT-6425
Open

Fix attached CSV being shadowed by same-named entity list#6908
grzesiek2010 wants to merge 4 commits into
getodk:masterfrom
grzesiek2010:COLLECT-6425

Conversation

@grzesiek2010

@grzesiek2010 grzesiek2010 commented Sep 29, 2025

Copy link
Copy Markdown
Member

Closes #6425

Why is this the best possible solution? Were any other approaches considered?

The core problem is that there is no authoritative "entity list vs. plain CSV" signal that survives to parse time. The only authoritative signal (type=entityList in the OpenRosa manifest) exists at
download time. At parse time, both look identical with the same jr://file-csv/<name>.csv reference, same on-disk CSV - so the fix has to manufacture a reliable discriminator that lives until parsing.

Chosen approach:

Two changes that together make file presence in the form's media directory the discriminator between a plain CSV and an entity list:

  1. isSupported defers to the attached file. LocalEntitiesInstanceProvider now returns false when the instance's src resolves to a media file that exists on disk, letting JavaRosa serve the attached CSV;
    Otherwise, it serves the entity list from the DB.

    if (mediaFileRepository.get(instanceSrc)?.exists() == true) {                                                                                                                                                         
        return false                                                                                                                                                                                                      
    }                                                                                                                                                                                                                     
    return createDataAdapter().supportsInstance(instanceId)                                                                                                                                                               
  2. Entity-list seeds are no longer persisted. After import into the DB, the temp CSV is deleted (tempMediaFile.delete()). This is safe because every entity read is DB-backed, so the seed file is redundant - and with it gone, a CSV in the media dir can only be a genuine attachment.

Other approaches considered:

  • Compare the attachment hash to the entity list's stored hash, if it equals it is an entity file - rejected: the entity list's manifest hash is derived from the dataset's last-updated timestamp, not the CSV content, so it never reliably matches a file's content hash.
  • Derive entity-ness from the form definition (in JavaRosa) - not possible: isSupported only gets the instance id and raw src, which is identical for a CSV and an entity list; JavaRosa has no entity awareness.
  • A per-form marker file/database listing entity-list attachments - a workable fallback, only needed if seeds had to stay on disk. Once we confirmed entity reads are fully DB-backed, file presence alone is a cleaner
    discriminator.

How does this change affect users? Describe intentional changes to behavior and behavior that could have accidentally been affected by code changes. In other words, what are the regression risks?

When a form attaches a static CSV whose name matches an entity list in the project, the attached CSV is now used instead of the entity list (#6425). Genuine entity-list forms are unaffected and keep loading from the DB.

The change touches how external instances are resolved and stops persisting entity-list seed CSVs in the media directory, so testing should focus on entity data loading being correct across the different form scenarios:

  • forms using entity lists only,
  • forms using entity lists + attached CSVs (including same-named collisions),
  • plain forms with attached CSVs.

Do we need any specific form for testing your changes? If so, please attach one.

The one linked in the issue for reproducing the bug.

Does this change require updates to documentation? If so, please file an issue here and include the link below.

No.

Before submitting this PR, please make sure you have:

  • added or modified tests for any new or changed behavior
  • run ./gradlew connectedAndroidTest (or ./gradlew testLab) and confirmed all checks still pass
  • added a comment above any new strings describing it for translators
  • added any new strings with date formatting to DateFormatsTest
  • verified that any code or assets from external sources are properly credited in comments and/or in the about file.
  • verified that any new UI elements use theme colors. UI Components Style guidelines

@grzesiek2010

grzesiek2010 commented Sep 29, 2025

Copy link
Copy Markdown
Member Author

At first, the issue seemed pretty straightforward based on the description, but it turned out to be more complex.

In Javarosa, choices are loaded without any awareness of entities - it treats all choices the same way. I believe that was intentional, and we don’t want to change this by passing the usesEntities parameter into Javarosa.

In Collect, we have LocalEntitiesInstanceProvider, which Javarosa uses to load entities (instead of choices from a static CSV file) when a matching dataset exists (as described in the issue). However, at that point we don’t have any details about the form (whether it uses entities or not) that could help us decide whether LocalEntitiesInstanceProvider#isSupported should return true or false.

Given that, the simplest solution I can think of is to introduce something similar to what we did for enabling/disabling entities in the settings. There, we had a function that determined whether LocalEntitiesInstanceProvider should be added. The difference now would be that instead of checking the settings (like before), the function would check the currently loaded form.

What do you think about this approach @seadowg?

@seadowg seadowg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've realized that I don't think this entirely solves the problem sadly: you could still have a form that uses an entity list, but also has a non entity list CSV that clashes with an entity list in the project. That's definitely an edge case, but I think the flow in the issue is as well.

I think to fully solve this we really need to be able to have access to information about the media file we're referencing (through a MediaFileRepository) for example. This is something that we've talked about on and off, but it's always been hard to justify. Maybe we should bite the bullet here? This could be a good issue to drive that out.

@grzesiek2010 grzesiek2010 changed the title Enable LocalEntitiesInstanceProvider only if a form uses entities Fix attached CSV being shadowed by same-named entity list Jun 10, 2026
@grzesiek2010

Copy link
Copy Markdown
Member Author

@seadowg After a long time, I got back to this PR and ended up using a different fix (based on your comment). Feel free to review it.

@grzesiek2010 grzesiek2010 requested a review from seadowg June 10, 2026 09:29
@grzesiek2010 grzesiek2010 marked this pull request as ready for review June 10, 2026 09:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

When a static CSV with the same name as an Entity List in the project is attached to a form, the Entity List is always used

2 participants